Ensemble of Neural Networks to Solve Class Imbalance Problem of Protein Secondary Structure Prediction
نویسندگان
چکیده
Protein secondary structures prediction (PSSP) is considered as a challenging task in bioinformatics. Many approaches have been proposed in last few decades in order to solve this problem. Despite the enhancements achieved, the prediction accuracy still remains limited. Accurate prediction of the secondary structure of proteins is a critical step in deducing tertiary structure of proteins and their functions. Among the proposed approaches to tackle this problem, artificial neural networks (ANN) are considered as one of the most successful methods that widely used in the field of PSSP. Recently, many efforts have been devoted to modify, improve and combine this technology with other machine learning methods in order to get better results. In this paper, we have proposed an ensemble method which combines the outputs of four feedforward neural networks. In each network one of the machine learning approaches has been applied in order to solve the class imbalance problem of protein secondary structure classification. The experimental results on RS126 data set show that our ensemble system has better performance compare to the best individual classifier. The results also reveal that the proposed system yields significant improvement in prediction accuracy of beta-sheet structure and a more balanced classification of three secondary structures.
منابع مشابه
Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches
DNA sequence, containing all genetic traits is not a functional entity. Instead, it transfers to protein sequences by transcription and translation processes. This protein sequence takes on a 3D structure later, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...
متن کاملSolving Linear Semi-Infinite Programming Problems Using Recurrent Neural Networks
Linear semi-infinite programming problem is an important class of optimization problems which deals with infinite constraints. In this paper, to solve this problem, we combine a discretization method and a neural network method. By a simple discretization of the infinite constraints,we convert the linear semi-infinite programming problem into linear programming problem. Then, we use...
متن کاملTesting Ensemble Methods on Prediction of Protein Secondary Structure
The geometric opinion pool (GOP) ensemble method uses a multiplicative combination of predictors, and it is tailored to probability estimation in multi-class problems. This enables a decomposition of the KullbackLeibler entropy error function into an ambiguity term and an average error term. This can be used to estimate generalization error with a combination of cross-validation and estimation ...
متن کاملTraining Neural Networks for Protein Secondary Structure Prediction: The Effects of Imbalanced Data Set
Protein secondary structure prediction (PSSP) is one of the main tasks in computational biology. During the last few decades, much effort has been made towards solving this problem, with various approaches, mainly artificial neural networks (ANN). Generally, in order to predict the protein secondary structure, the ANN training process is performed using CB513 data set. Like protein structures d...
متن کاملProfiles and Majority Voting-Based Ensemble Method for Protein Secondary Structure Prediction
Machine learning techniques have been widely applied to solve the problem of predicting protein secondary structure from the amino acid sequence. They have gained substantial success in this research area. Many methods have been used including k-Nearest Neighbors (k-NNs), Hidden Markov Models (HMMs), Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), which have attracted atte...
متن کامل